Although traditional clustering methods (e.g., K-means) have been shown to be useful 
in the social sciences, it is often difficult for such methods to handle situations where 
clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already 
recognized in many disciplines, provides a more flexible alternative to these traditional 
clustering methods. Fuzzy clustering differs from other traditional clustering methods in 
that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy 
clustering techniques remain relatively unused in the social and behavioral sciences. The 
purpose of this paper is to introduce fuzzy clustering to these audiences who are currently 
relatively unfamiliar with the technique. In order to demonstrate the advantages associated 
with this method, cluster solutions of a common perfectionism measure were created 
using both fuzzy clustering and K-means clustering, and the results compared. Results of 
these analyses reveal that different cluster solutions are found by the two methods, and 
the similarity between the different clustering solutions depends on the amount of cluster 
overlap allowed for in fuzzy clustering. 
INTRODUCTION 

Clustering is a common method used in the psychological, social, 
and physical sciences to identify subgroups or profiles of indi- 
viduals within the larger population who share similar patterns 
on a set of variables. Traditional methods of clustering (e.g., 
K-means) attempt to place each individual case into a cluster 
with other observations with which it shares a similar score 
pattern (Everitt et al., 2011). Such traditional hard clustering 
methods allow an individual to belong to only one cluster. Such 
an approach also ignores the fact that an individual may share 
traits with multiple subgroups in the population, and thus poten- 
tially belong to more than one such cluster. The purpose of 
this study is to showcase the use of a soft clustering technique, 
fuzzy clustering, that is currently underutilized in the social 
sciences. Unlike traditional hard clustering methods, fuzzy clus- 
tering allows for individual cases to simultaneously belong to 
more than one cluster, thus having the potential to inform not 
only the cluster with which a case has the strongest member- 
ship but also how each case is related to each of the clusters 
(Everitt et al., 2011). As a result, fuzzy clustering can provide the 
researcher with a more realistic picture of subgroups and sub- 
group relations within the population. Rather than assuming that 
an individual is only a member of a single subgroup, allowing the 
individual to share membership in multiple clusters reflects the 
reality that such membership does not need to be an either/or 
proposition (Gan et al., 2007). Thus, fuzzy clustering has the 
potential to provide more information about the structure of the 
data than other clustering methods (Kaufman and Rousseeuw, 
2005). 



This paper provides a comparison of clustering solutions based 
on the traditional K-means and fuzzy clustering approaches using 
the same data set in order to demonstrate the similarities and 
differences between the techniques and showcase the utility of the 
unique features associated with fuzzy clustering. This comparison 
will be done using a data set measuring aspects of perfectionism in 
a college undergraduate sample. The perfectionism data was cho- 
sen both to appeal to the intended social science audience for this 
study as well as to help add to the growing discussion of a 
group-based perfectionism orientation. The following sections 
describe the data and the research question that this data set was 
intended to answer. 

THE FIELD OF PERFECTIONISM AND NEED FOR CLUSTERING 
RESEARCH 

Perfectionism is generally defined as a condition in which the 
individual holds excessively high personal standards with a ten- 
dency toward overly critical review of personal achievements 
and behaviors (Stoeber et al., 2009). Perfectionism was originally 
viewed as a single dimension that was deleterious to optimal 
functioning. Hamachek (1978) introduced a line of inquiry that has 
dominated perfectionism research in the past 35 years, identifying 
both "normal" and "neurotic" perfectionism. Since the 
1990s, there has been universal agreement that perfectionism 
is a multidimensional construct, with multiple measures con- 
structed to assess these factors, including the Multidimensional 
Perfectionism Scale (Hewitt and Flett, 1991), Almost Perfect 
Scale (Slaney et al., 2001), and the Frost Multidimensional 
Perfectionism Scale (FMPS) (Frost et al., 1990). 
MULTIDIMENSIONAL ORIENTATION OF PERFECTIONISM 

While the items and eventual factor structure for each scale differ, 
the underlying conclusions of the research in the field confirm 
essentially similar patterns of responses, with both positive (e.g., 
high personal standards, organization) and negative aspects (e.g., 
elevated self-criticism, susceptibility to external pressures) of per- 
fectionism being identified (e.g., Stoeber and Otto, 2006). The 
FMPS has been perhaps the most commonly studied set of per- 
fectionism items and originally identified a six-factor solution to 
the 35-item scale. While several studies have used the FMPS and 
provided strong validation for the scale and a multidimensional 
nature for perfectionism, there have been multiple alternative rep- 
resentations for the construct (Stoeber, 1998; Purdon et al., 1999; 
Harvey et al., 2004). The various factor solutions for the FMPS 
provide ample opportunity to analyze a pattern of performances 
in the normal population. However, in a systematic compari- 
son of the factorial representations of the FMPS, Harvey et al. 
(2004) provided compelling evidence that their four-factor solu- 
tion was durable, explained the variance effectively and captured 
the representations offered by other research teams. Their recon- 
ceptualization of the 35-item scale produced the following four 
factors (a) Negative Projections — items addressing the tendency 
to make social comparisons and hold self-doubt over compe- 
tence; (b) Achievement Expectations — items addressing holding 
high personal standards and ego involvement goal orientation; 
(c) Parental Influences — items addressing parental influences and 
reactions to performance; and (d) Organization — consistently 
identified in other factor solutions for the FMPS that identify 
tendencies toward organization and neatness. Their analysis for 
this new factor structure showed theoretical similarity to Stoeber's 
(1998) four-factor structure, but demonstrated a better fit to the 
data and strong construct validity with the original six-factor 
solution (Frost et al., 1990) upon which the scale was created. 

GROUP-BASED ORIENTATION 

An alternative approach to examining perfectionism in learners 
has been to adopt a group-based or individualistic orientation, 
where the focus is on constructing perfectionism profiles based 
on responses to one of the primary assessment tools (Stoeber 
and Otto, 2006). The predominant approach to reviewing per- 
fectionism through a group-based orientation has been to use 
cluster analysis to generate the profiles of perfectionism identi- 
fied in the response data (e.g., Parker, 1997; Rice and Dellwo, 
2002; Grzegorek et al., 2004; Ashby and Bruner, 2005; Gilman 
et al., 2005; Mobley et al., 2005). As with the multidimensional 
orientation, research into the group-based view of perfectionism 
has generated several alternative conceptualizations for "types" 
of perfectionism (e.g., Parker, 1997; Rice and Dellwo, 2002; 
Grzegorek et al., 2004; Ashby and Bruner, 2005; Gilman et al., 
2005; Mobley et al., 2005). Stoeber and Otto's (2006) review of the 
extant research revealed the bulk of group-based perfectionism 
research can be summarized rather effectively by reviewing the 
presence of two dimensions of perfectionism: evaluative concerns 
and personal standards. In their proposed tripartite framework 
to explain the various research, non-perfectionists were identi- 
fied as those with low levels of personal standards perfectionism 
(regardless of evaluative concerns). For those with high levels 
of personal standards, individuals with low evaluative concerns 
were classified as "healthy perfectionists" and those with high 
evaluative concerns were classified as "unhealthy perfectionists." 
Gaudreau and Thompson (2010) proposed an alternative model 
based on this same framework, suggesting that the tripartite 
framework may be an incomplete representation of dispositional 
perfectionism. In particular, Gaudreau and Thompson (2010) 
proposed a 2 x 2 model — identifying individuals who were (a) 
non-perfectionists, (b) pure personal striving perfectionists, (c) 
pure evaluative concerns perfectionists, and (d) a "mixed" per- 
fectionist who holds both high personal standards and evaluative 
concerns. The difference in these two models is the addition in 
the 2x2 model of the group of perfectionists with only personal 
standards perfectionism (no evaluative concerns). 

Two key questions arise when reviewing the debate regard- 
ing the Gaudreau and Thompson (2010) and Stoeber and Otto 
(2006) representations for dispositional perfectionism. The first 
is whether the individuals with characteristically low levels of 
personal standards perfectionism can be split into two groups 
(Gaudreau, 2013). The second is a fundamental issue of whether 
each cluster is a distinct group with clear differentiation. That 
is, in both models there is the typical assumption that the sep- 
arate clusters do not overlap, capturing distinct representations 
of "types of perfectionists." This study takes on both of these 
questions by using perfectionism data to compare two different 
clustering approaches and showcase the potential benefits of the 
fuzzy clustering approach while also attempting to add to the 
perfectionism profile literature. 

CLUSTERING METHODS 

As described above, group-based research on perfectionism 
commonly relies on K-means clustering. While this clustering 
method has been shown to be useful and effective, it does not 
allow researchers to account for overlap among the clusters. In 
order to address the issue of overlap, we propose the use of fuzzy 
clustering. The following section provides descriptions of both 
the K-means and fuzzy clustering algorithms, highlighting their 
similarities and differences. 

K-MEANS CLUSTERING 

K-means clustering is a common centroid based clustering 
method that identifies a specified number of non-overlapping 
clusters within data (Gan et al., 2007). It requires the researcher to 
pre-specify the number of clusters and then places each individual 
into one of them. It should be noted that the actual profile (i.e., 
means on the variables used to cluster) of the clusters is not pre- 
specified, but only the number. The K-means clustering algorithm 
is based on the following steps. 

(1) The researcher indicates the number of clusters. 

(2) Initial cluster centroids are formed either by using random 
selection for the K clusters, or through pre-specification of 
cluster centroids by the researcher. 

(3) The error sum of squares (ESS), based on squared Euclidean 
distances, is calculated for the current cluster solution. 

(4) Each individual is reassigned to the cluster to whose centroid 
it is closest. 
(5) The cluster centroids are updated after each reassignment. 

(6) Steps 3-5 are repeated until no further reassignment of indi- 
viduals to clusters takes place, i.e., each individual is in the 
cluster with the nearest centroid. 
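
The steps above can be sketched in a few lines of Python. This is a minimal illustration only: it uses a simple batch (Lloyd-style) update rather than the Hartigan-Wong algorithm that R's kmeans() uses by default, it assumes that the researcher pre-specifies the initial centroids, and the function name, the `init` argument, and the data are all hypothetical.

```python
import numpy as np

def kmeans_sketch(X, init, n_iter=100):
    """Batch K-means from pre-specified starting centroids (illustrative only)."""
    centroids = np.asarray(init, dtype=float).copy()   # step 2: initial centroids
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Steps 3-4: squared Euclidean distance from every case to every
        # centroid, then assign each case to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 6: stop when no case changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 5: update each centroid to the mean of its current members.
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return labels, centroids

# Hypothetical scores forming two well-separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans_sketch(X, init=X[[0, 3]])
```

Only the number of clusters and the starting centroids are supplied; the final cluster profiles (the rows of `centroids`) emerge from the data.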

ESS is expressed as (Izenman, 2008): 
$$
ESS = \sum_{k=1}^{K} \sum_{c(i)=k} (x_i - \bar{x}_k)^{\top} (x_i - \bar{x}_k) \tag{1}
$$



where K is the number of pre-specified clusters, $\bar{x}_k$ is the centroid 
for cluster k, $x_i$ is a vector of scores on the variables used to cluster 
individual i, and c(i) is the cluster containing individual i. The 
ESS is calculated for each iteration of the process described above, 
until all reassignments are completed, and ESS itself is minimized. 
When such convergence is reached, the researcher then examines 
the resultant clusters in order to determine whether they are sub- 
stantively meaningful and clearly distinct based upon the pattern 
of means on the variables used to cluster, as well as other variables 
that are hypothesized to differ among the clusters. By defini- 
tion this latter step in the clustering process involves subjective 
judgment on the part of the researcher. 
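
As a small worked illustration of the ESS criterion, the sketch below computes the sum of squared distances between each case and the centroid of its assigned cluster for two candidate assignments of the same data. The function name and the data are hypothetical.

```python
import numpy as np

def ess(X, labels):
    """Error sum of squares for a hard assignment of cases to clusters."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        # Squared Euclidean distance of each member to its own centroid.
        total += ((members - centroid) ** 2).sum()
    return total

X = np.array([[1.0, 2.0], [3.0, 2.0], [10.0, 8.0], [12.0, 8.0]])
ess_good = ess(X, np.array([0, 0, 1, 1]))   # neighbouring cases share a cluster
ess_poor = ess(X, np.array([0, 1, 0, 1]))   # neighbouring cases are split apart
```

Here `ess_good` is 4.0 and `ess_poor` is 117.0, so an algorithm that minimizes ESS prefers the first assignment.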

FUZZY CLUSTERING 

Fuzzy clustering is an extension of the traditional K-means 
algorithm. However, unlike K-means clustering, fuzzy clustering 
focuses on cluster membership based on fuzzy set theory (Everitt 
et al., 2011). Given this paradigm, fuzzy clustering allows individuals 
to have multiple cluster memberships, thereby providing 
useful information about the degree of cluster overlap in the pop- 
ulation, as well as information about the relative membership of 
each individual within each cluster. Thus, in fuzzy clustering each 
case is allowed (but not required) to have partial membership in 
multiple clusters. For example, cluster membership for a hypo- 
thetical case might exhibit the following pattern: the individual 
has a 56% membership share in cluster 1, a 32% share in clus- 
ter 2, and a 12% share in cluster 3. As implied in this example, 
the degree to which a case belongs to a certain cluster is indi- 
cated by its membership share, which ranges from 0 to 1 (i.e., it 
is the proportion of the case that belongs to the cluster; Guldemir 
and Sengur, 2006). The algorithm for fuzzy clustering is based 
on minimizing the following objective function, as described by 
Kaufman and Rousseeuw (1990): 



$$
\sum_{k=1}^{K} \frac{\sum_{i} \sum_{j} u_{ik}^{2} \, u_{jk}^{2} \, d_{ij}}{2 \sum_{j} u_{jk}^{2}} \tag{2}
$$



Here, K is as defined above. In addition, $u_{ik}$ is a membership 
coefficient reflecting the membership share for observation i in 
cluster k. For a given individual, $\sum_{k=1}^{K} u_{ik} = 1$ and all $u_{ik} \geq 0$. 
The value $d_{ij}$ is a measure of dissimilarity for observations i and 
j, across the variables used in the clustering. For continuous data, 
the Euclidean distance measure $d_{ij}$ is expressed as: 

$$
d_{ij} = \sqrt{(x_i - x_j)^{\top} (x_i - x_j)} \tag{3}
$$

Thus, fuzzy clustering makes use of an iterative algorithm in 
which the function in (2) is minimized through altering the values 
of $u_{ik}$. The membership coefficients are in turn calculated as 
(Kaufman and Rousseeuw, 2005): 

$$
u_{ik} = \frac{1}{1 + \sum_{k' \neq k} \left( d_{ik} / d_{ik'} \right)^{2/(m-1)}} \tag{4}
$$

In (4), $d_{ik}$ and $d_{ik'}$ represent the distances between observation i 
and clusters k and k' ($k \neq k'$), and m is the membership exponent, 
which will be described in detail below. 

In the context of fuzzy clustering, the amount of overlap 
among clusters across the sample is referred to as the degree of 
fuzziness. The degree of fuzziness allowed in a particular analy- 
sis can be controlled by the researcher through manipulation of 
a quantity known as the membership exponent (ME). This value 
ranges from 1 (minimal fuzziness and equal to K-means) to infinity, 
where larger values are associated with a greater degree of 
fuzziness (Gan et al., 2007). Previous studies have recommended 
setting the membership exponent to 2 in many applications in 
practice (Lekova, 2010; Maharaj and D'Urso, 2011). The mem- 
bership exponent chosen by the researcher will depend on how 
much cluster overlap the researcher expects in their data. 
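
The effect of the membership exponent can be illustrated with a short sketch. It assumes the standard fuzzy c-means form of the membership coefficient, u_ik = 1 / sum over k' of (d_ik / d_ik')^(2/(m-1)), where the sum runs over all clusters including k itself; the function name and the distances are hypothetical, all distances are assumed to be strictly positive, and the code is not meant to reproduce the internals of R's fanny().

```python
import numpy as np

def membership_coefficients(d, m=2.0):
    """d[i, k] is the distance from case i to cluster k; returns u[i, k]."""
    ratio = d[:, :, None] / d[:, None, :]            # ratio[i, k, k'] = d_ik / d_ik'
    return 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=2)

# One hypothetical case at distances 1, 2, and 4 from three clusters.
d = np.array([[1.0, 2.0, 4.0]])
u_fuzzy = membership_coefficients(d, m=2.0)   # default-style, fuzzier solution
u_crisp = membership_coefficients(d, m=1.2)   # exponent close to 1, crisper solution
```

With m = 2 the case holds about 76% membership in its nearest cluster, with the rest spread over the other two. With m = 1.2 almost all of its membership goes to the nearest cluster, which is the sense in which small exponents approach a hard K-means-like assignment.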

PRIOR RESEARCH APPLICATIONS OF FUZZY CLUSTERING 

Researchers in fields such as medicine, technology (e.g., imagery 
software, computer science), and business already use fuzzy clus- 
tering with some regularity. Specifically, fuzzy clustering has been 
used in gene research for cancer prediction (Alshalalfah and 
Alhajj, 2009), tumor classification (Wang et al., 2003), research 
with MRI data (Ahmed et al., 2002), changes in remote sensing 
images (Ghosh et al., 2011), satellite image retrieval (Ooi and Lim, 
2006), bankruptcy forecasting (De Andres et al., 2011), computer 
grading of fish products (Hu et al., 1998), and classification of 
management styles (Andrews and Beynon, 2011). 

Several studies using existing and simulated data have been 
conducted to compare the performance of traditional hard clus- 
tering methods to fuzzy clustering. Based upon these studies, it 
appears that fuzzy clustering can be a useful clustering method 
due to its ability to produce both hard and soft clusters, show the 
relationship of clusters to one another, and deal effectively with 
outliers (Goktepe et al., 2005; Grubesic, 2006). The ability to handle 
outliers is an especially important feature of fuzzy clustering 
given that outliers can be a serious problem for other clustering 
algorithms such as K-means (Grubesic, 2006). In the context of 
fuzzy clustering, the outlier's membership is distributed through- 
out the clusters, instead of the outlier being placed into one 
cluster. Unlike fuzzy clustering, K-means clustering would have 
the outlier belong to one cluster, which can skew the structure of 
the clusters (Grubesic, 2006). Additionally, fuzzy clustering has 
been shown to accurately group cases into clusters with real and 
simulated data (Schreer et al., 1998; Goktepe et al., 2005). Schreer 
et al. (1998) found that with artificial data both fuzzy clustering 
and K-means clustering on average misclassified 12% of the data 
and had similar cluster solutions. While fuzzy clustering has been 
shown to produce similar clusters to K-means on simulated data, 
fuzzy clustering was also able to show the strength of membership for 
each cluster (Schreer et al., 1998). 
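
The way an outlier's membership is spread across clusters can be seen in a small numerical sketch. The cluster centers, the two cases, and the function name are hypothetical, and the memberships are computed with the standard fuzzy c-means weighting (an assumption, not necessarily the procedure used in the studies cited above).

```python
import numpy as np

centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])   # hypothetical cluster centers
typical = np.array([0.2, 0.1])     # lies close to the first center
outlier = np.array([2.0, 60.0])    # lies far from every center

def soft_and_hard(x, centers, m=2.0):
    """Return (fuzzy memberships, index of the nearest center) for one case."""
    d = np.linalg.norm(centers - x, axis=1)
    w = d ** (-2.0 / (m - 1.0))          # closer centers receive larger weights
    return w / w.sum(), int(d.argmin())

u_typical, hard_typical = soft_and_hard(typical, centers)
u_outlier, hard_outlier = soft_and_hard(outlier, centers)
```

A hard assignment still forces the outlier into a single cluster, whereas its fuzzy memberships are close to one third for each cluster, signalling that it does not clearly belong anywhere. The typical case, by contrast, has nearly all of its membership in the first cluster.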

Despite the demonstrated benefits, fuzzy clustering has yet to 
be fully utilized throughout the social and behavioral sciences. It 
does appear, however, that researchers in the social and behavioral 
sciences are aware that not all clusters are discrete. For example, in 
a study of personality types using principal-components analysis, 
Chapman and Goldberg (2011) describe their case cluster struc- 
tures as indistinct or "fuzzy," rather than discrete, when referring 
to the overlapping of clusters in visual representations of their 
data. Although graphical representations can be quite informa- 
tive, it is also important to be able to quantify the degree of such 
overlap. The utilization of fuzzy clustering could be considered a 
more natural approach in many applications, because behavioral 
clusters are not always distinct, and there will be some overlap due 
to the abstract nature of human behavior. 

METHODS 

In order to demonstrate the utility of fuzzy clustering, a compar- 
ison of traditional K-means clustering and fuzzy clustering was 
made using a previously analyzed data set from a study on per- 
fectionism. The FMPS (Frost et al., 1990) was used in a sample 
of undergraduate university students enrolled in educational psy- 
chology and business education courses. Data were collected over 
the course of three academic years, where participation in data 
collection satisfied a course requirement. Collectively, 486 stu- 
dents (304 females, 182 males) participated in the study. A total of 
30 cases had to be deleted due to missing data, bringing the final 
sample size to 456. As only a small number of cases had missing 
information, simple listwise deletion was used. The average age 
of the participants was 20.97 (SD = 3.3), and the sample was 
predominantly Caucasian (92.6%), consistent with the population 
from which the sample was recruited. 

As mentioned earlier, in a systematic comparison of the factor 
representations of the FMPS, Harvey et al. (2004) provided com- 
pelling evidence in favor of their four-factor solution. These four 
factors included Negative Projections, Achievement Expectations, 
Parental Influences, and Organization. In order to compare and 
demonstrate the performance of hard and fuzzy clustering methods, 
a K-means cluster solution and a fuzzy clustering solution based on 
the four FMPS Harvey factors were run using R 
statistical software, version 2.13.1 (R Development Core Team, 
2010). The fanny() function located in the cluster R package 
was used for fuzzy clustering, and the kmeans() function 
located in the stats R package for K-means clustering. For both 
the fuzzy clustering and K-means solutions, the default R settings 
were used. By default, the K-means clustering algorithm 
in R uses the Hartigan-Wong algorithm (Hartigan and Wong, 
1979), and for fuzzy clustering R uses a Euclidean dissimilarity 
measure with a membership exponent of 2.0. First, the default 
fuzzy clustering solution was compared to the K-means clustering 
solution in terms of similarity of cluster structure, cluster solu- 
tion fit, and cluster interpretation. Following this comparison, the 
membership exponent for fuzzy clustering was manipulated to 
demonstrate differences in cluster interpretation between fuzzier 
and crisper cluster solutions for the same data. To accomplish 
this comparison, the membership exponent was changed to 1.2 
(which is virtually the smallest membership exponent R will 
allow) to obtain a crisp cluster solution, and the cluster solu- 
tions were again compared in terms of similarity of results. The 
purpose of changing the membership exponents is to show how 
manipulating the degree of fuzziness can provide different but 
meaningful cluster solutions. 

RESULTS 

K-MEANS CLUSTER SOLUTIONS 

Descriptive statistics and psychometric information for the FMPS 
Harvey subscales appear in Table 1. Prior to clustering, 
multicollinearity was assessed through use of zero-order correlations 
and VIF statistics. Zero-order correlations between the Harvey 
subscales ranged from r = 0.032 to 0.618, with VIF ranging from 
1.186 to 1.861. Together, these results indicate that multicollinearity 
was not a concern, and the clustering proceeded as planned. 
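
For readers unfamiliar with the VIF, the sketch below shows one common way to obtain it. For standardized variables, the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix, which is the same quantity as 1 / (1 - R²) from regressing that variable on the others. The correlation matrix is hypothetical and is not the study's data.

```python
import numpy as np

# Hypothetical correlation matrix for four subscales.
R = np.array([[1.00, 0.30, 0.60, 0.05],
              [0.30, 1.00, 0.25, 0.35],
              [0.60, 0.25, 1.00, 0.03],
              [0.05, 0.35, 0.03, 1.00]])

# VIF for each variable: diagonal of the inverse correlation matrix.
vif = np.diag(np.linalg.inv(R))

# Cross-check for the first variable using 1 / (1 - R^2), where R^2 comes
# from regressing variable 1 on the remaining three.
r = R[0, 1:]
r_squared = r @ np.linalg.inv(R[1:, 1:]) @ r
vif_first = 1.0 / (1.0 - r_squared)
```

A VIF of 1 indicates that a variable is uncorrelated with the others; values grow as a variable becomes more predictable from the rest.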

Originally, two different K-means cluster solutions were cre- 
ated: one solution based on the raw subscales and one solution 
using standardized subscales. Because the FMPS Harvey subscales 
have differing numbers of items, it was important to ensure that 
the differential weighting of the variables did not impact the 
interpretation of the cluster solution. After comparing the stan- 
dardized and unstandardized solutions, it was determined that 
both solutions supported the same conceptual profiles, thus the 
cluster solution based on the unstandardized variables was chosen 
for ease of interpretation. 
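
The standardization referred to above can be illustrated with the subscale means and standard deviations reported in Table 1. The respondent's raw scores below are hypothetical; the point is only that z-scores place subscales with different numbers of items on a common metric before distances are computed.

```python
import numpy as np

# Means and standard deviations from Table 1, in the order: negative projections,
# achievement expectations, parental influence, organization.
means = np.array([31.20, 28.44, 24.08, 24.00])
sds = np.array([8.44, 5.35, 6.86, 4.60])

# Hypothetical raw subscale scores for one respondent.
raw = np.array([39.64, 28.44, 17.22, 28.60])

# z-scores: distance from the sample mean in standard deviation units.
z = (raw - means) / sds
```

This respondent is one standard deviation above the mean on negative projections and organization, at the mean on achievement expectations, and one standard deviation below the mean on parental influence.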

As K-means clustering is the standard approach, it was per- 
formed first. Initially, however, a hierarchical cluster analysis was 
performed in order to determine the number of clusters for the 
K-means approach. Based on the visual information from the 
dendrogram, three and four cluster solutions were created using 
K-means cluster analysis. Comparison of the two different K- 
means solutions revealed that the four-cluster solution was more 
consistent with the current theoretical models of perfectionism. 
Cluster means for the four-cluster solution appear in Table 2. 
Within-cluster R² was calculated for each cluster as a measure of 
cluster similarity, ranging from 0.69 to 0.80, indicating moderate 
to high within-cluster similarity. 

Table 1 | Descriptive statistics and properties of the FMPS Harvey 
subscales. 

Subscale               # of items   Min-max   Mean    Standard deviation   Cronbach alpha
Negative projections   12           12-60     31.20   8.44                 0.86
Ach expectations       8            8-40      28.44   5.35                 0.85
Parental influence     9            9-45      24.08   6.86                 0.89
Organization           6            6-30      24.00   4.60                 0.89

The clusters listed in Table 2 were tentatively named based 
on the relationships observed among the four Harvey factors 
and are described briefly. First, Externalized Perfectionists 
(K-means cluster 1) were characterized primarily by having low 
organization and achievement expectations with moderate levels 
of parental influence and negative projections. The term 
Externalized Perfectionism was selected as it depicts the profile 
of an individual with moderately elevated perfectionism, driven 
primarily by external influences (similar to notions of socially 
prescribed perfectionism). Second, the Mixed Perfectionists (K- 
means cluster 2) reported high overall levels of perfectionism, 
with heightened negative projections, achievement expectations 
and parental influence, but reported moderate levels of organization. 
Internalized Perfectionists (K-means cluster 3) were 
individuals with moderate overall perfectionist tendencies who 
demonstrated heightened levels of organization and personally 
prescribed achievement expectations. Finally, Non-Perfectionists 
(K-means cluster 4) were those individuals in the sample who did 
not demonstrate an elevated degree of any of the Harvey 
perfectionism factors and thus showed no clear 
perfectionist tendencies. 

SIMILARITY OF K-MEANS AND FUZZY CLUSTERING SOLUTIONS 

Tables 2 and 3 provide information regarding the similarity of the 
K-means and fuzzy clustering solutions. As already discussed above, 
Table 2 presents the cluster means for the original 4 cluster K- 
means solution and the default 4 cluster fuzzy clustering solution. 
Also presented are a 3 cluster fuzzy clustering solution and the 4 
cluster fuzzy clustering solution using a membership exponent of 
1.2, which will be discussed in more detail below. 

As can be seen in Table 2, the cluster means for the 4-cluster 
K-means solution and the 4 cluster fuzzy clustering solution 
show similar patterns indicating similar cluster interpretation. 
K-means cluster 1 (externalized perfectionists) and K-means cluster 3 
(internalized perfectionists) are both most closely related to cluster 1 of 
the 4-cluster fuzzy cluster solution. According to Table 3, fuzzy 
cluster 1 has the highest percent of participants belonging to the 
externalized perfectionists as defined by K-means (55.4%), but 
also has considerable overlap with the internalized perfectionists 
(42.4%) K-means cluster. The second K-means cluster (mixed- 
perfectionists) was most closely associated with fuzzy cluster 2. 
Fuzzy cluster 2 had the highest percent of participants classified by 
K-means as mixed perfectionists (77.0%) with the second highest 
percent belonging to externalized perfectionists at only 16.7%. 
K-means cluster 4 (non-perfectionists) relates most strongly to 
fuzzy cluster 4, with 78.9% of the cases in this cluster belonging 
to the K-means non-perfectionism cluster. 
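
A cross-classification of this kind can be produced by tabulating the hard fuzzy-cluster assignment of each case against its K-means assignment and converting each row to percentages. The sketch below assumes that this is how such a table is built; the labels are hypothetical and the function name is illustrative.

```python
import numpy as np

def row_percent_crosstab(row_labels, col_labels, n_rows, n_cols):
    """Percentage of each row cluster's cases that fall in each column cluster."""
    counts = np.zeros((n_rows, n_cols))
    np.add.at(counts, (row_labels, col_labels), 1)       # tally each (row, column) pair
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)

fuzzy_hard = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # hypothetical closest fuzzy cluster
kmeans_lab = np.array([0, 0, 0, 1, 1, 1, 2, 2])   # hypothetical K-means cluster
table = row_percent_crosstab(fuzzy_hard, kmeans_lab, n_rows=2, n_cols=3)
```

Each row of `table` sums to 100, so an entry reads as "the percentage of this fuzzy cluster's cases that K-means placed in that cluster".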

Thinking about the big picture provided by the 4 cluster fuzzy 
solution, although the clusters roughly follow the same pattern of 
means as the K-means solution, it is evident that fuzzy clusters 3 
and 4 are very similar, indicating that one of the clusters may 
be redundant. This prompted investigation into a 3 cluster fuzzy 
clustering solution shown in Table 2 and depicted in Figure 1. 
Looking at the 3 cluster fuzzy clustering solution it seems that 
fuzzy cluster 3 is very similar in interpretation to clusters 3 and 
4 of the 4 cluster fuzzy clustering solution. The remaining two 



Table 3 | Percentage of fuzzy cluster solutions that belong to 
corresponding K-means clustering solutions with a membership 
exponent of 2.0. 

                  K-means 1   K-means 2   K-means 3   K-means 4
Fuzzy cluster 1   55.4        2.2         42.4        0.0
Fuzzy cluster 2   16.7        77.0        6.3         0.0
Fuzzy cluster 3   26.4        0.0         44.5        29.1
Fuzzy cluster 4   1.6         0.0         19.5        78.9



Table 2 | Means for the K-means and fuzzy clustering hard cluster solutions. 

                                                   Neg. Proj      Ach exp        Par inf        Org
                                                   M (SD)         M (SD)         M (SD)         M (SD)

K-MEANS
Cluster 1: externalized perfectionists (n = 103)   32.78 (3.79)   25.66 (3.72)   25.93 (5.03)   20.62 (4.13)
Cluster 2: mixed perfectionists (n = 99)           42.74 (4.84)   32.22 (4.07)   32.50 (5.59)   24.68 (3.97)
Cluster 3: internalized perfectionists (n = 121)   30.21 (4.06)   32.25 (3.22)   21.17 (4.19)   27.03 (2.96)
Cluster 4: non-perfectionists (n = 133)            22.27 (4.36)   24.31 (4.35)   19.04 (3.78)   23.36 (4.68)

FUZZY FOUR CLUSTER SOLUTION
Cluster 1 (n = 92)                                 33.40 (4.58)   29.33 (5.55)   25.16 (6.34)   23.04 (5.47)
Cluster 2 (n = 126)                                41.13 (4.84)   31.61 (3.89)   31.10 (5.51)   24.52 (3.78)
Cluster 3 (n = 110)                                26.41 (6.01)   26.95 (6.04)   20.25 (4.84)   23.49 (5.67)
Cluster 4 (n = 128)                                23.94 (3.39)   25.95 (3.93)   19.69 (2.81)   24.63 (3.34)

FUZZY THREE CLUSTER SOLUTION
Cluster 1 (n = 136)                                40.94 (5.23)   31.65 (4.07)   30.88 (5.67)   24.46 (3.92)
Cluster 2 (n = 128)                                31.38 (4.52)   28.58 (5.78)   23.65 (5.87)   22.74 (5.76)
Cluster 3 (n = 192)                                24.17 (4.41)   26.07 (4.60)   19.56 (3.59)   24.52 (3.99)

FUZZY ME 1.2 CLUSTER SOLUTION
Cluster 1 (n = 110)                                32.69 (3.80)   25.67 (3.72)   25.75 (4.97)   20.84 (4.13)
Cluster 2 (n = 100)                                42.68 (4.85)   32.21 (4.05)   32.43 (5.60)   24.71 (3.96)
Cluster 3 (n = 119)                                29.90 (4.02)   32.34 (3.19)   20.94 (4.25)   26.93 (3.09)
Cluster 4 (n = 127)                                22.07 (4.34)   24.20 (4.29)   19.01 (3.74)   23.44 (4.76)

ME, membership exponent used for creation of fuzzy clusters. When not specified, the default ME of 2 for fuzzy clustering was used. 
FIGURE 1 | Visual representation of the 3 cluster fuzzy clustering 
solution. Axes are standardized representations of the principal 
components of the cluster solution. 



clusters of the 3 cluster fuzzy solution appear to map onto the 
K-means clusters 1 and 2. 

Although Table 2 indicates a similar interpretation for the K- 
means and fuzzy clustering solutions, the cluster similarity is not 
absolute. Table 3 presents the percentage of overlap between the 
four K-means clusters and their corresponding fuzzy clusters. As 
can be seen, in general each K-means cluster has a clear match to 
a fuzzy cluster with which it shares a majority of cases. However, 
clusters 1 and 3 do not map onto the K-means clusters as cleanly. 
Regarding K-means cluster 1, the highest correspondence can be 
seen with fuzzy cluster 1, however, they only share 55.4% of their 
cases. K-means cluster 1 also shares 16.7% of its cases with fuzzy 
cluster 2 and 26.4% of its cases with fuzzy cluster 3. K-means 
cluster 3 shows even less consistency, with 44.5% of its cases 
shared with fuzzy cluster 3 and 42.4% of its cases shared with fuzzy 
cluster 1. An additional 19.5% of its cases are shared with fuzzy 
cluster 4. 

In summary, the 4 cluster solutions obtained from the K- 
means and the fuzzy clustering methods were similar in interpre- 
tation. However, when a moderate degree of cluster overlap was 
modeled into the clusters (as is the default in fuzzy clustering) 
two of the clusters appeared nearly indistinguishable to the point 
that a 3-cluster solution gave nearly the same information. This 
finding is emphasized in Table 3 with the considerable overlap 
of fuzzy cluster 3 with multiple K-means clusters. Conceptually 
speaking, this speaks to potential group similarity of the individ- 
uals in these clusters. 

FUZZY CLUSTER MEMBERSHIP 

Focusing more fully on the fuzzy clustering solution, relationships 
between the fuzzy clusters can be investigated by looking 
at the cluster membership. In fuzzy clustering, cluster member- 
ship refers to the degree to which a fuzzy cluster overlaps with 
another fuzzy cluster. The cluster membership of the 4 cluster 
fuzzy solution appears in Table 4. According to Table 4, as would 
be expected, the individuals in each fuzzy cluster belong most 



Table 4 | Summary of clustering membership in percentage for fuzzy 
clustering. 



% cluster 1 % cluster 2 % cluster 3 % cluster 4 



Fuzzy cluster 1 


67.43 


10.14 


14.37 


797 


Fuzzy cluster 2 


14.75 


72.60 


708 


5.57 


Fuzzy cluster 3 


11.76 


3.61 


66.55 


18.09 


Fuzzy cluster 4 


4.99 


2.41 


18.98 





Membership Exponent =2.0 



Strongly to their own cluster than to any other cluster. Consistent 
with the findings from above, fuzzy clusters 2 and 4 appear to be 
more distinct than clusters 1 and 3 belonging to their own clus- 
ters more strongly (72.6% for cluster 2 and 73.5% for cluster 4) 
than fuzzy clusters 1 and 3 (67.4% for cluster 1 and 66.5% for 
cluster 3). 

Further information can be gained by examining which clusters
overlap. Focusing on fuzzy cluster 4, for example, we can see that
there is 18% overlap with fuzzy cluster 3, indicating that
individuals identified with this profile are also similar in
characteristics to individuals in cluster 3. Thinking theoretically
about these results, conceptual similarities can be seen between
clusters 3 (most closely mapping to Internalized Perfectionists) and
4 (most closely mapping to Non-Perfectionists). Each of these
clusters demonstrates lower levels on all of the Harvey factor
representation for the FMPS (see Table 2). Thus, the conceptual
relationship between fuzzy clusters 3 and 4 can be seen through both
the cluster membership and the cluster means. Along the same lines,
fuzzy cluster 4 shows no practical similarities with fuzzy clusters
1 (most closely mapping to externalized perfectionists) and 2 (most
closely mapping to mixed perfectionists), as evidenced by both the
small percentages of overlap in the cluster membership (4.99% with
fuzzy cluster 1 and 2.41% with fuzzy cluster 2) and the fuzzy
cluster means in Table 2.
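
A membership summary like Table 4 can be sketched as follows. The
source does not spell out how the table was computed, so this sketch
assumes that each row is the average membership profile of the cases
whose primary (largest-coefficient) cluster is that row's cluster.
The function name and the toy coefficients are hypothetical.

```python
def membership_summary(memberships):
    """Average membership profile (in %) of cases grouped by primary cluster.

    memberships: one row per case; each row holds that case's membership
    coefficients for every cluster and sums to 1.
    Returns one row per cluster, or None for a cluster that is nobody's
    primary cluster.
    """
    n_clusters = len(memberships[0])
    sums = [[0.0] * n_clusters for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for row in memberships:
        # Primary cluster = cluster with the largest membership coefficient.
        primary = max(range(n_clusters), key=lambda j: row[j])
        counts[primary] += 1
        for j, u in enumerate(row):
            sums[primary][j] += u
    return [
        [100.0 * s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(n_clusters)
    ]


# Hypothetical membership coefficients for six cases and three clusters.
memberships = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.30, 0.60],
    [0.20, 0.20, 0.60],
]

summary = membership_summary(memberships)
for c, row in enumerate(summary, start=1):
    print("Fuzzy cluster", c, [round(v, 1) for v in row])
```

Under this reading, the diagonal entries show how strongly cases
belong to their own cluster, and the off-diagonal entries show which
other clusters they resemble.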

MANIPULATION OF CLUSTER OVERLAP 

As previously mentioned, the membership exponent used in fuzzy
clustering can be changed to increase or decrease the amount of
cluster overlap, in order to model the hypothesized amount of
overlap. To investigate the impact of the membership exponent on the
fuzzy clustering solution, the K-means solution was compared with a
fuzzy cluster solution obtained using a membership exponent of 1.2.
A membership exponent of 1.2 was chosen because it allows for less
overlap in the clusters, thus producing crisper clusters similar to
the K-means results while still allowing for fuzziness within the
clusters.
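
The effect of the membership exponent can be illustrated with a
small sketch. It uses the standard fuzzy c-means membership update,
in which a case's membership in cluster k is 1 divided by the sum
over clusters j of (d_k / d_j) raised to the power 2 / (m - 1),
where d is the distance to a cluster center and m is the membership
exponent. This is an assumption made for illustration: the R routine
used in the study is not named in this section and may use a
different formulation. The distances are hypothetical.

```python
def fcm_memberships(distances, m):
    """Fuzzy c-means membership coefficients for a single case.

    distances: distances from the case to each cluster center (all > 0).
    m: membership exponent (fuzzifier); must be greater than 1.
    """
    if m <= 1:
        raise ValueError("membership exponent must be greater than 1")
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_k / d_j) ** power for d_j in distances)
        for d_k in distances
    ]


# Hypothetical distances from one case to three cluster centers.
distances = [1.0, 1.5, 3.0]

soft = fcm_memberships(distances, m=2.0)   # more overlap allowed
crisp = fcm_memberships(distances, m=1.2)  # less overlap allowed

print("m = 2.0:", [round(u, 3) for u in soft])
print("m = 1.2:", [round(u, 3) for u in crisp])
```

With these distances, the coefficient for the nearest cluster rises
from roughly 0.64 at an exponent of 2.0 to above 0.98 at an exponent
of 1.2, which is why a low exponent yields assignments that are
close to a hard partition.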

Results of this comparison appear in Tables 2 and 5. Whereas there
were previously differences between the K-means and fuzzy clusters,
decreasing the membership exponent created crisper clusters with
results nearly identical to the K-means solution. The cluster means
for the ME = 1.2 solution shown in Table 2 are nearly identical to
their K-means cluster counterparts, and the cluster correspondence
shown in Table 5 is more than 90% for all four clusters, indicating
strong agreement between the K-means and ME 1.2 fuzzy clustering
solutions. Fuzzy clusters 2 and 4, which were already identified as
corresponding reasonably well with the K-means solution when the
membership exponent was 2.0, are nearly identical (99-100%
agreement) to the K-means solution once the membership exponent is
set to 1.2. Fuzzy clusters 1 and 3, which exhibited a large degree
of overlap with the other clusters in the ME 2.0 solution, are also
nearly identical to their K-means cluster counterparts, with 93.6%
agreement for cluster 1 and 97.5% agreement for cluster 3. Thus, the
manipulation of the membership exponent to model more distinct
clusters with little cluster overlap resulted in a solution nearly
identical to the original K-means solution.

Table 5 | Percentage of fuzzy cluster solutions that belong to
corresponding K-means clustering solutions with a membership
exponent of 1.2.

                              K-means clusters
                   Cluster 1   Cluster 2   Cluster 3   Cluster 4

Fuzzy cluster 1       93.6         2.7         3.6        0.00
Fuzzy cluster 2       0.00       100.0        0.00        0.00
Fuzzy cluster 3       0.00         2.5        97.5        0.00
Fuzzy cluster 4       0.00        0.00         1.0        99.0

DISCUSSION 

While hard clustering methods are dominant in the behavioral
sciences, there is also great worth in investigating the utility of
more flexible clustering algorithms. Fuzzy clustering is one such
technique, as it provides more flexibility in the modeling and
interpretation of cluster solutions. This study demonstrated that
fuzzy clustering can also offer a different perspective on the
cluster solutions, perhaps better illuminating the nature of
relationships between clusters.

Through the first comparison of the four-cluster K-means and fuzzy
solutions, we found two distinct yet similar cluster solutions. The
K-means cluster solution created four distinct, non-overlapping
clusters, whereas fuzzy clustering created two clearly distinct
clusters and two clearly overlapping clusters. One potential reason
for the difference in cluster solutions between the two methods is
the way fuzzy clustering handles ambiguity in clusters. Unlike
K-means clustering, fuzzy clustering allows observations to belong
to multiple clusters, with the primary cluster being the one for
which the individual has the largest membership coefficient. In this
study, the fuzzy clustering solution included one cluster with
moderate means on all factors, one cluster with higher means on all
factors, especially negative projections, and two clusters with low
means on all factors. This yields a different yet meaningful
alternative to the solution produced by K-means.

The allowance for overlap among the clusters increases the potential
utility of fuzzy clustering for gaining insights into the nature of
the subgroups present in the population, by demonstrating more
clearly than do K-means solutions the proximity and interrelatedness
of these groups. Through the overlapping clusters, not only in the
current study but in other studies as well (e.g., Hwang and Thill,
2009; Ghosh et al., 2011), fuzzy clustering assisted in the
understanding of the population structure and the similarities of
subgroups therein. This is in part due to the flexibility in
interpreting the clusters and the varying degrees of membership that
can be shown in fuzzy clustering (Diaz et al., 2006; Coppi et al.,
2010). Like most areas in psychology, perfectionism profiles can be
seen as abstract, since the profiles will naturally have similar
attributes in certain areas. Without the use of fuzzy clustering, we
could only speculate about how the means impact the relationship
between the clusters. Fuzzy clustering enabled us, as it did the
authors of other studies (e.g., Grubesic, 2006; Andrews and Beynon,
2011), to appropriately handle overlapping characteristics, relate
objects to more than one cluster, and provide more information about
the structures of the clusters.

In addition to demonstrating the potential for finding different
clustering solutions using the K-means and fuzzy approaches, this
study showed how similar results can be obtained from the two
clustering methods through manipulation of the membership exponent
in the fuzzy clustering algorithm. When using a very low membership
exponent (1.2), the fuzzy algorithm yielded nearly the same cluster
solution as did K-means. However, because the membership exponent
can be adjusted, thereby changing the degree of overlap allowed
between clusters, fuzzy clustering has the added advantage of being
able to investigate relationships among clusters as well (Schreer et
al., 1998). This is particularly useful in the behavioral sciences,
as not all data situations will have the same degree of ambiguity.

As with research on perfectionism, there are other areas in
psychology and the social sciences where modeling of overlapping and
ambiguous concepts could be beneficial. Chapman and Goldberg's
(2011) research on personality provides another example. They point
to graphical representations suggesting that their cluster
structures were "fuzzy." Consequently, fuzzy clustering may be
thought of as a more natural approach to clustering such data, as it
does not force indistinguishable cases into one cluster or another
and would be able to model such cluster "fuzziness." However,
despite the fact that researchers are aware that not all clusters
are discrete, fuzzy clustering has not been utilized to its full
potential in the social and behavioral sciences.

LIMITATIONS AND DIRECTIONS FOR FUTURE RESEARCH 

The aim of this study was to highlight the advantages of a
relatively underutilized and potentially useful form of clustering
in the context of the social sciences. It is important to note,
however, that the present illustration used the default R settings
when running these analyses. Although the default settings do have
support as reasonably robust choices, there are many customizations
and choices involved in the clustering process that can impact the
final cluster result. Such choices include the variables to cluster
on and the number of clusters, as well as technical clustering
details such as the clustering algorithm, initialization method, and
dissimilarity measure. See Kaufman and Rousseeuw (2005) and Everitt
et al. (2011) for further details regarding K-means and fuzzy
clustering options. Also, like all statistical methods, cluster
analysis algorithms can perform poorly when non-optimal data
situations arise, such as the presence of outliers,
multicollinearity among the variables used to group individuals in
the sample, or high skewness of the clustering variables (Everitt et
al., 2011). Additionally, due to the maximum likelihood estimation
method used in estimation of cluster solutions, issues with optimal
solution choice and local maxima do arise, making the choice of
initial values for the algorithm very important (Steinley, 2003).
Thus, as with any statistical method, it is important to consider
customization options and potential issues when interpreting cluster
solutions.
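
One common safeguard against poor starting values is to run the
algorithm from several random starts and keep the best solution. The
sketch below illustrates this for K-means only, using a plain
implementation of Lloyd's algorithm and the within-cluster sum of
squares as the criterion. Both are assumptions made for
illustration, as are the function names, the number of starts, and
the toy data; none of this reproduces the study's analysis.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's algorithm from one random start; returns (centers, wss)."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as its group's mean; keep an empty
        # group's previous center unchanged.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wss


def best_of_restarts(points, k, n_starts=50, seed=0):
    """Run K-means from several random starts; keep the lowest-WSS run."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[1])


# Hypothetical toy data: three well-separated groups of three points.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (10.0, 0.0), (10.1, 0.2), (9.8, 0.1),
]

best_centers, best_wss = best_of_restarts(points, k=3)
print("best within-cluster sum of squares:", round(best_wss, 3))
```

A single start that places two initial centers in the same group can
converge to a solution that splits one group and merges the other
two; taking the best of many starts makes such a local optimum much
less likely to be reported as the final solution.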

The purpose of this paper is to provide an illustration of the
utility of fuzzy clustering for a research question from the social
sciences. Although this illustration brings up many noteworthy
points, there is still much that is unknown about the optimal use of
fuzzy clustering in practice. For example, there has been little
research into optimal usage and accuracy under typical conditions
encountered in the social sciences. There is also little research
providing guidance on the choice of membership exponent. In
addition, it is not yet known how the performances of fuzzy and
K-means clustering compare under a wide variety of data distribution
conditions, particularly with respect to classification accuracy and
identification of relationships among clusters. All of these are
topics worthy of future research endeavors. It is hoped, however,
that this study has introduced social scientists to a clustering
technique that can enhance psychological research involving the
identification of subgroups in the population while simultaneously
laying the foundation for future studies focusing on the benefits,
optimal usage, and properties of fuzzy clustering in psychology.